Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for diffusing the results of scientific research has increased considerably. In the digital era, Portable Document Format (PDF) became the established format of electronic journal articles. This structured form, combined with regular and wide dissemination, spread scientific advancements easily and quickly. However, the rapidly increasing number of published scientific articles demands more time and effort for systematic literature reviews, searches, and screening. Comprehending and extracting useful information from digital documents is also challenging because of the complex structure of PDF.

To help a soil science team from the United States …


New Algorithms For Large Datasets And Distributions, Sutanu Gayen Jul 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

In this dissertation, we make progress on certain algorithmic problems broadly over two computational models: the streaming model for large datasets and the distribution testing model for large probability distributions.

First, we consider the streaming model, where a large sequence of data items arrives one by one. The computer must make a single pass over this sequence, processing every item quickly and in limited space. In Chapter 2, motivated by a bioinformatics application, we consider the problem of estimating the number of low-frequency items in a stream, which has received only limited theoretical attention so far. We give an …


Pascal's Triangle Modulo N And Its Applications To Efficient Computation Of Binomial Coefficients, Zachary Warneke Mar 2019

Honors Theses

In this thesis, Pascal's Triangle modulo n will be explored for n prime and n a prime power. Using the results from the case when n is prime, a novel proof of Lucas' Theorem is given. Additionally, using both the results from the exploration of Pascal's Triangle here, as well as previous results, an efficient algorithm for computation of binomial coefficients modulo n (a choose b mod n) is described, and its time complexity is analyzed and compared to naive methods. In particular, the efficient algorithm runs in O(n log(a)) time (as opposed to …
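For readers unfamiliar with Lucas' Theorem, the following is a minimal sketch of how it reduces a binomial coefficient modulo a prime p to a product over the base-p digits of a and b. It covers only the prime case and is not the thesis's algorithm, whose details are not given in this abstract; the function name `binom_mod_prime` is hypothetical.

```python
from math import comb


def binom_mod_prime(a: int, b: int, p: int) -> int:
    """Return C(a, b) mod p for a prime p, using Lucas' Theorem.

    Lucas' Theorem: C(a, b) is congruent mod p to the product of
    C(a_i, b_i), where a_i and b_i are the base-p digits of a and b.
    Assumes p is prime; the function does not verify this.
    """
    if b < 0 or b > a:
        return 0
    result = 1
    while a > 0 or b > 0:
        a_digit, b_digit = a % p, b % p
        if b_digit > a_digit:
            # One digit-wise factor is zero, so the whole product is zero.
            return 0
        result = (result * comb(a_digit, b_digit)) % p
        a //= p
        b //= p
    return result
```

The loop runs once per base-p digit of a, so the number of iterations grows with log(a) rather than with a itself.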


The Global Disinformation Order: 2019 Global Inventory Of Organised Social Media Manipulation, Samantha Bradshaw, Philip N. Howard Jan 2019

Copyright, Fair Use, Scholarly Communication, etc.

Executive Summary

Over the past three years, we have monitored the global organization of social media manipulation by governments and political parties. Our 2019 report analyses the trends of computational propaganda and the evolving tools, capacities, strategies, and resources.

1. Evidence of organized social media manipulation campaigns which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. In each country, there is at least one political party or government agency using social media to shape public attitudes domestically.

2. Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda …


Application Of Cosine Similarity In Bioinformatics, Srikanth Maturu May 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Finding sequences similar to an input query sequence (DNA or protein) in a sequence data set is an important problem in bioinformatics. It gives researchers an intuition of what could be related or how the search space can be reduced for further tasks. An exact brute-force nearest-neighbor algorithm used for this task has complexity O(m * n), where n is the database size and m is the query size. Such an algorithm faces time-complexity issues as the database and query sizes increase. Furthermore, the use of alignment-based similarity measures such as minimum edit distance adds additional complexity to the …
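As a rough illustration of the idea, the sketch below computes cosine similarity between two sequences and runs a brute-force nearest-neighbor search over a small database. Representing each sequence as a vector of k-mer counts is an assumption made here for illustration; the abstract does not say which vector representation the thesis uses, and the names `kmer_counts`, `cosine_similarity`, and `nearest` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping length-k substrings (an assumed representation)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[kmer] for kmer, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_u * norm_v)


def nearest(query: str, database: list[str], k: int = 3) -> str:
    """Brute-force search: compare the query against every database entry."""
    q = kmer_counts(query, k)
    return max(database, key=lambda s: cosine_similarity(q, kmer_counts(s, k)))
```

Because the comparison is alignment-free, each pairwise score costs time linear in the sequence lengths, but the search still visits every database entry.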


Decaf: A New Event Detection Logic For The Purpose Of Fusing Delineated-Continuous Spatial Information, Kerry Q. Hart May 2014

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Geospatial information fusion is the process of synthesizing information from complementary data sources located at different points in space and time. Spatial phenomena are often measured at discrete locations by sensor networks, technicians, and volunteers; yet decisions often require information about locations where direct measurements do not exist. Traditional methods assume the spatial phenomena to be either discrete or continuous, an assumption that underlies and informs all subsequent analysis. Yet certain phenomena defy this dichotomy, alternating as they move across spatial and temporal scales. Precipitation, for example, appears continuous at large scales, but it can be temporally decomposed into discrete …


Dynamic Load Balancing For I/O-Intensive Applications On Clusters, Xiao Qin, Hong Jiang, Adam Manzanares, Xiaojun Ruan, Shu Yin Nov 2009

School of Computing: Faculty Publications

Load balancing for clusters has been investigated extensively, mainly focusing on the effective usage of global CPU and memory resources. However, previous CPU- or memory-centric load-balancing schemes suffer a significant performance drop under I/O-intensive workloads due to the imbalance of I/O load. To solve this problem, we propose two simple yet effective I/O-aware load-balancing schemes for two types of clusters: (1) homogeneous clusters, where nodes are identical, and (2) heterogeneous clusters, which comprise a variety of nodes with different performance characteristics in computing power, memory capacity, and disk speed. In addition to assigning I/O-intensive sequential and parallel jobs …


P-Code: A New Raid-6 Code With Optimal Properties, Chao Jin, Hong Jiang, Dan Feng, Lei Tian Jun 2009

CSE Conference and Workshop Papers

RAID-6 significantly outperforms the other RAID levels in disk-failure tolerance due to its ability to tolerate any two concurrent disk failures in a disk array. The underlying parity array codes have a significant impact on RAID-6’s performance. In this paper, we propose a new XOR-based RAID-6 code, called the Partition Code (P-Code). P-Code is a very simple and flexible vertical code, making it easy to understand and implement. It works on a group of (prime – 1) or (prime) disks, and its coding scheme is based on an equal partition of a specified two-integer-tuple set. P-Code has the following properties: …