Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 21 of 21

Full-Text Articles in Engineering

Effects Of Training Datasets On Both The Extreme Learning Machine And Support Vector Machine For Target Audience Identification On Twitter, Siaw Ling Lo, David Cornforth, Raymond Chiong Dec 2014

Research Collection School Of Computing and Information Systems

The ability to identify or predict a target audience in the increasingly crowded social space can give a company a competitive advantage over other companies. In this paper, we analyze various training datasets, which include the Twitter content of an account owner and its list of followers, using features generated in different ways for two machine learning approaches: the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM). Various configurations of the ELM and SVM have been evaluated. The results indicate that training datasets using features generated from the owner's tweets achieve the best performance relative to the other feature sets. …
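For readers unfamiliar with the ELM, the following is a minimal sketch of the generic technique, assuming binary labels in {-1, +1} and dense numeric feature vectors (for example, hypothetical term-frequency features derived from tweets). It is not the configuration evaluated in the paper. The hidden layer consists of random, fixed tanh units; only the output weights are learned, here by ridge-regularized least squares solved with plain Gaussian elimination so that the example needs only the standard library. The names `ELM`, `n_hidden`, and `reg` are illustrative.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # ridge term keeps H^T H invertible
        self.seed = seed

    def _hidden(self, x):
        # Hidden activations tanh(w . x + b) for every random hidden unit.
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        rng = random.Random(self.seed)
        d = len(xs[0])
        # Hidden-layer parameters are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        size = self.n_hidden
        # Normal equations of ridge regression: (H^T H + reg I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.reg if i == j else 0.0)
                for j in range(size)] for i in range(size)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(size)]
        self.beta = _solve(hth, hty)
        return self

    def decision(self, x):
        return sum(b * a for b, a in zip(self.beta, self._hidden(x)))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1
```

Because only the linear output layer is fitted, training reduces to a single linear solve. This is the usual reason ELMs train quickly compared with iteratively trained networks; an SVM instead solves a margin-maximization problem.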


Identifying The High-Value Social Audience From Twitter Through Text-Mining Methods, Siaw Ling Lo, David Cornforth, Raymond Chiong Nov 2014

Research Collection School Of Computing and Information Systems

Doing business on social media has become common practice for many companies. While the content shared on Twitter and Facebook offers plenty of opportunities to uncover business insights, it remains a challenge to sift through the huge volume of social media data and identify the potential social audience that is highly likely to be interested in a particular company. In this paper, we analyze the Twitter content of an account owner and its list of followers through various text mining methods, including fuzzy keyword matching, statistical topic modeling, and machine learning approaches. We use tweets of …
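Of the methods listed, fuzzy keyword matching is the simplest to illustrate. The sketch below is an assumption about how such a matcher could work, not the authors' implementation. It tokenizes a tweet and uses the standard-library `difflib.get_close_matches` to tolerate misspellings and hashtags. The keyword list, the 0.8 similarity cutoff, and the helper names `fuzzy_keyword_hits` and `keyword_score` are hypothetical.

```python
import difflib
import re


def fuzzy_keyword_hits(tweet, keywords, cutoff=0.8):
    """Return the keywords that approximately match at least one token of the tweet."""
    # Lowercase and split into word-like tokens, keeping hashtags and mentions.
    tokens = re.findall(r"[a-z0-9#@']+", tweet.lower())
    hits = set()
    for kw in keywords:
        # SequenceMatcher-based similarity; tolerates typos such as "cofee".
        if difflib.get_close_matches(kw.lower(), tokens, n=1, cutoff=cutoff):
            hits.add(kw)
    return hits


def keyword_score(tweets, keywords, cutoff=0.8):
    """Fraction of a follower's tweets containing at least one fuzzy keyword match."""
    if not tweets:
        return 0.0
    matched = sum(1 for t in tweets if fuzzy_keyword_hits(t, keywords, cutoff))
    return matched / len(tweets)
```

A per-follower score like this could serve as a crude relevance signal, for instance ranking followers by how often they tweet about a company's products, before applying heavier topic-modeling or learning methods.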


VIREO @ TRECVID 2014: Instance Search And Semantic Indexing, Wei Zhang, Hao Zhang, Ting Yao, Yijie Lu, Jingjing Chen, Chong-Wah Ngo Nov 2014

Research Collection School Of Computing and Information Systems

This paper summarizes the VIREO group's participation in two tasks: instance search and semantic indexing. We present our approaches and analyze the results obtained in the TRECVID 2014 benchmark evaluation.


Organizing Video Search Results To Adapted Semantic Hierarchies For Topic-Based Browsing, Jiajun Wang, Yu-Gang Jiang, Qiang Wang, Kuiyuan Yang, Chong-Wah Ngo Nov 2014

Research Collection School Of Computing and Information Systems

Organizing video search results into semantically structured hierarchies can greatly improve the efficiency of browsing complex query topics. Traditional hierarchical clustering techniques are inadequate because they cannot generate semantically interpretable structures. In this paper, we introduce an approach that organizes video search results into an adapted semantic hierarchy. Because many popular search topics, such as celebrities and famous cities, have Wikipedia pages with hierarchical topic structures, we start from the Wikipedia hierarchies and adjust the structures according to the characteristics of the videos returned by a search engine. Ordinary clustering based on textual information of …


Click-Through-Based Subspace Learning For Image Search, Yingwei Pan, Ting Yao, Xinmei Tian, Houqiang Li, Chong-Wah Ngo Nov 2014

Research Collection School Of Computing and Information Systems

One of the fundamental problems in image search is to rank image documents according to a given textual query. In this paper, we address two limitations of existing image search engines. First, there is no straightforward way to compare textual keywords with visual image content. Image search engines therefore depend heavily on the surrounding text, which is often noisy or too sparse to describe the image content accurately. Second, ranking functions are trained on query-image pairs labeled by human annotators, which makes annotation intellectually expensive and impossible to scale up. We demonstrate that the above two fundamental challenges …


DC: Small: Energy-Aware Coordinated Caching In Cluster-Based Storage Systems, Yifeng Zhu Oct 2014

University of Maine Office of Research Administration: Grant Reports

The main goal of this project is to improve the performance and energy efficiency of I/O (Input/Output) operations on large-scale cluster computing platforms.

The major activities include:

1) characterizing memory access workloads;
2) investigating the impact of new and emerging storage and memory devices, such as SSD and PCM, on I/O performance;
3) studying energy-efficient buffer and cache replacement algorithms;
4) leveraging SSDs as a new caching device to improve I/O performance and energy efficiency.
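Cache replacement studies of this kind are usually measured against a conventional baseline. The sketch below shows plain least-recently-used (LRU) replacement for a block cache. It is not the project's energy-aware algorithm, which the report does not describe here. It counts misses as a rough proxy for accesses to the slower, more power-hungry backing device, and that proxy is an assumption. The `load` callback and the `LRUCache` name are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity block cache with least-recently-used eviction."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, block_id, load):
        """Return the block, fetching it via load(block_id) on a miss."""
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        # A miss stands in for a read from the backing device (and its energy cost).
        self.misses += 1
        data = load(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data
```

An energy-aware policy would typically change the eviction decision (for example, preferring to keep blocks whose devices are currently spun down), while keeping the same access interface.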


Name-Face Association In Web Videos: A Large-Scale Dataset, Baselines, And Open Issues, Zhi-Neng Chen, Chong-Wah Ngo, Wei Zhang, Juan Cao, Yu-Gang Jiang Sep 2014

Research Collection School Of Computing and Information Systems

Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly in large-scale realistic scenarios, mainly due to the scarcity of datasets constructed under such conditions. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75 073 Internet videos totaling over 4 000 hours, covering 2 427 celebrities and 649 001 faces. To our knowledge, this is the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss …


FOSS Big Data Storage Solution, Gary L. Jaffe Aug 2014

STAR Program Research Presentations

Using the AERO Institute as an IT test bed or "sandbox", a small, agile development team will design, build, and test a data management storage system to support post-processing of archived and in-flight data collected with the Piccolo flight control system and the Compact Fiber Optic Sensing System (C-FOSS). Both systems are integrated on the APV3 aircraft, a small remotely operated vehicle. Because of the amount of data collected from C-FOSS, a system will be designed to sort and organize large data sets. An open-source database will be explored as a viable solution to manage large data loads and provide multi-cluster system …


FOSS Big Data Storage Solution, Nurdeen Salami Aug 2014

STAR Program Research Presentations

NASA projects require a reliable approach to storing large volumes of data. Accordingly, it is crucial to adopt a lightweight, reliable, and scalable database. Current NASA databases carry costly license fees and offer limited speed and flexibility. The purpose of using the AERO Institute as an IT test bed, or "Sandbox," is to design, build, test, and implement software solutions before transferring them to NASA projects. Cassandra coupled with the Astyanax API is a viable solution for storing big data. Objectives: store a minimum of 2 GB of C-FOSS data in multiple file formats (.csv, .log, .xml, and .jpg); use benchmark tests to …


Ultimate Codes: Near-Optimal MDS Array Codes For RAID-6, Zhijie Huang, Hong Jiang, Chong Wang, Ke Zhou, Yuhong Zhao Jul 2014

CSE Technical Reports

As modern storage systems have grown in size and complexity, RAID-6 is poised to replace RAID-5 as the dominant form of RAID architecture because of its ability to protect against double disk failures. Many excellent erasure codes specifically designed for RAID-6 have emerged in recent years. However, all of them have limitations. In this paper, we present a class of near-perfect erasure codes for RAID-6, called the Ultimate codes. These codes encode, update, and decode either optimally or nearly optimally, regardless of the code length. This implies that, by using these codes, we can build highly efficient and …
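As background for the optimality claims above, two standard facts hold for any RAID-6 array code with $k$ data columns and two parity columns ($n = k + 2$). These are general bounds, not results taken from the paper. First, a code that survives any two column failures can hold at most $n - 2$ columns' worth of data, and MDS codes meet this bound exactly. Second, every data element $d$ must appear in at least two parity elements. Otherwise, losing the column that holds $d$ together with the column that holds its only parity element (possibly the same column) would make $d$ unrecoverable. Updating one data element therefore costs at least two parity updates:

$$
\text{storage efficiency} = \frac{k}{k+2}, \qquad \text{parity updates per data-element update} \ge 2
$$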


S-Code: Lowest Density MDS Array Codes For RAID-6, Zhijie Huang, Hong Jiang, Ke Zhou, Yuhong Zhao, Chong Wang Jul 2014

CSE Technical Reports

RAID, a storage architecture designed to exploit I/O parallelism and provide data reliability, has been widely deployed in computing systems as a storage building block. In large-scale storage systems in particular, RAID-6 is gradually replacing RAID-5 as the dominant form of disk array because it can tolerate concurrent failures of any two disks. MDS (maximum distance separable) array codes are the most popular erasure codes for implementing RAID-6, since they offer optimal storage efficiency and efficient encoding and decoding algorithms. In this paper, we propose a new class of MDS array codes called …
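To make "tolerating concurrent failures of any two disks" concrete, the sketch below implements the classic P+Q RAID-6 scheme. P is a plain XOR parity, and Q is a Reed-Solomon-style weighted sum over GF(2^8). This is a swapped-in, well-known construction (the one used, for example, by Linux software RAID), not S-code or any XOR-only MDS array code from the paper. The field polynomial 0x11D and generator 2 are the conventional choices and are assumptions here. The function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because the multiplicative group has order 255."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def encode(data):
    """Compute P = XOR of D_i and Q = sum of g^i * D_i over equal-length data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y from the surviving blocks plus P and Q."""
    if x == y:
        raise ValueError("x and y must differ")
    size = len(p)
    # Treat the lost blocks as zeros to obtain the parity of the survivors alone.
    survivors = [bytes(size) if i in (x, y) else blk for i, blk in enumerate(data)]
    pxy, qxy = encode(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        ps = p[j] ^ pxy[j]  # equals Dx ^ Dy
        qs = q[j] ^ qxy[j]  # equals g^x*Dx ^ g^y*Dy
        # qs ^ g^y*ps = (g^x ^ g^y) * Dx, so divide by (g^x ^ g^y).
        dx[j] = gf_mul(qs ^ gf_mul(gy, ps), denom_inv)
        dy[j] = ps ^ dx[j]
    return bytes(dx), bytes(dy)
```

The GF(2^8) multiplications in Q are what XOR-only MDS array codes such as those described above aim to avoid; those codes achieve the same two-failure tolerance using XOR operations alone.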


Click-Through-Based Cross-View Learning For Image Search, Yingwei Pan, Ting Yao, Tao Mei, Houqiang Li, Chong-Wah Ngo, Yong Rui Jul 2014

Research Collection School Of Computing and Information Systems

One of the fundamental problems in image search is to rank image documents according to a given textual query. Existing search engines depend heavily on surrounding text to rank images, or they use query-image pairs annotated by human labelers to train a series of ranking functions. However, there are two major limitations: 1) the surrounding text is often noisy or too sparse to describe the image content accurately, and 2) human annotations are resource-intensive and thus cannot be scaled up. In this paper, we demonstrate that these two fundamental challenges can be mitigated by jointly exploring the …
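The core idea of comparing a textual query with visual content in a learned common space can be sketched as follows. The projection matrices `w_text` and `w_image` would be learned from click-through data in the approach described above; here they are fixed, hypothetical values, and the learning step is not shown. Ranking by cosine similarity in the shared space is one reasonable choice, but it is an assumption rather than the paper's exact scoring function.

```python
import math


def project(matrix, vec):
    """Multiply a (latent_dim x input_dim) matrix, given as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_images(query_vec, image_vecs, w_text, w_image):
    """Rank image ids by cosine similarity to the query in the shared latent space."""
    q = project(w_text, query_vec)  # textual view mapped into the latent space
    scores = {img_id: cosine(q, project(w_image, v))  # visual view mapped likewise
              for img_id, v in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Once both views live in the same space, the query and the image can be compared directly, so ranking no longer depends solely on the surrounding text.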


Placing Videos On A Semantic Hierarchy For Search Result Navigation, Song Tan, Yu-Gang Jiang, Chong-Wah Ngo Jun 2014

Research Collection School Of Computing and Information Systems

Current commercial search engines typically present video search results as a flat list, which does not support efficient browsing of complex search topics with multiple semantic facets. In this article, we propose organizing video search results in a highly structured way. Specifically, videos are placed on a semantic hierarchy that accurately organizes the various facets of a given search topic. To pick the most suitable videos for each node of the hierarchy, we define and use three important criteria: relevance, uniqueness, and diversity. Extensive evaluations on a large YouTube video dataset demonstrate the effectiveness of our approach.
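One common way to trade relevance against diversity when filling a single node is greedy Maximal Marginal Relevance (MMR). The sketch below uses MMR as a swapped-in stand-in; it is not the article's formulation. It does not model the uniqueness criterion (avoiding the same video at several nodes). The relevance scores, the `similarity` function, and the weight `lam` are hypothetical inputs.

```python
def select_videos(candidates, relevance, similarity, k, lam=0.7):
    """Greedily pick k videos, rewarding relevance and penalizing redundancy (MMR-style)."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def gain(v):
            # Redundancy is the similarity to the closest video already chosen.
            redundancy = max((similarity(v, s) for s in selected), default=0.0)
            return lam * relevance[v] - (1 - lam) * redundancy
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected
```

With `lam` close to 1 the selection approaches a pure relevance ranking. Lowering it pushes near-duplicate videos out in favor of ones that cover other aspects of the node's topic.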


A Novel, Tag-Based File-System, Aaron Laursen May 2014

Mathematics, Statistics, and Computer Science Honors Projects

For decades, computer use has largely focused on managing and manipulating files: creating and consuming media, browsing the web, and developing software. Even direct device access, in systems such as UNIX and Plan 9, can largely be reduced to locating, creating, reading, and writing files. To facilitate these operations, developers have created a vast assortment of file-systems, each presenting a unique framework underlying nearly everything people do with a computer.

For various reasons, these file-systems have historically represented only incremental improvements and alterations from their predecessors, leaving the basic design and interaction models relatively unchanged. Because of this, most common file-systems …
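As a point of contrast with path-based hierarchies, the sketch below shows the basic data structure behind tag-based file lookup. It keeps a bidirectional mapping between tags and files, and a query returns the files that carry every requested tag. This is a generic illustration, not the design from the honors project. The file identifiers and the `TagIndex` name are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Maps tags to file identifiers so files can be located by tag sets instead of paths."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def tag(self, file_id, *tags):
        for t in tags:
            self._files_by_tag[t].add(file_id)
            self._tags_by_file[file_id].add(t)

    def untag(self, file_id, tag):
        self._files_by_tag[tag].discard(file_id)
        self._tags_by_file[file_id].discard(tag)

    def query(self, *tags):
        """Return the files carrying every requested tag (set intersection)."""
        if not tags:
            return set()
        # .get avoids creating empty entries for unknown tags.
        sets = [self._files_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)

    def tags_of(self, file_id):
        return set(self._tags_by_file.get(file_id, ()))
```

Unlike a directory tree, a file here can belong to any number of overlapping "locations" at once, and a query over several tags plays the role of a path.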


Computational-Communicative Actions Of Informational Processing, Florentin Smarandache, Stefan Vladutescu Apr 2014

Branch Mathematics and Statistics Faculty and Staff Publications

This study falls within the field of Information Science. The zetetic aim of the research is twofold: a) to define the concept of an action of informational processing and b) to design a taxonomy of actions of informational processing. First, the investigation tries to demonstrate that the computational actions of informational processing, or informational actions, are computational-investigative configurations for structuring information: clusters of highly aggregated operations that are carried out in a unitary manner, operate convergently, and behave like a single computational device. From a methodological point of view, they belong to the category of analytical instruments for the informational processing of …


A CRIS Data Science Investigation Of Scientific Workflows Of Agriculture Big Data And Its Data Curation Elements, Benjamin D. Branch, Peter N. Baker, Jai Xu, Elisa Bertino Mar 2014

Libraries Faculty and Staff Presentations

This joint collaboration between the Purdue Libraries and the Cyber Center demonstrates the next generation of computational platforms supporting interdisciplinary collaborative research. Such platforms are necessary to keep pace with rapid technological advancement, industry demand, and the scholarly convergence toward open data, open access, big data, and cyber-infrastructure data science training. Our approach will use a Discovery Undergraduate Research Investigation effort as a preliminary research means to further joint library and computer science research on data curation, tool development, and refinement.


Agricultural And Water Harvesting Opportunities In Kenya, Via A Crowd-Sourced, Citizen Science Hybrid Paradigm, Benjamin D. Branch, James Tindall, Rosemary Moki, Peter N. Baker, Jai Xu, Elisa Bertino Mar 2014

Libraries Faculty and Staff Presentations

This is a potential joint research collaboration between Purdue University and citizens of Kenya, via the Global Engineering Program, in the areas of agriculture and water harvesting. Specifically, in rural Kenya, outside of Nairobi, lives can be greatly improved. Libraries of tomorrow will have the global capacity and responsibility to serve all aspects of global citizenry. Herein is one possible Kenyan example.

In partnership with the Tinmore Institute, an International Food, Water, and Energy Security team of experts, such international collaboration with Purdue's Global Engineering program could be quite successful within the areas of agricultural and water security …


Providing Flexible File-Level Data Filtering For Big Data Analytics, Lei Xu, Ziling Huang, Hong Jiang, Lei Tian, David Swanson Jan 2014

CSE Technical Reports

The enormous size of big data datasets creates a need for effective data filtering techniques to accelerate the analytics process. We propose a Versatile Searchable File System, VSFS, which provides a transparent, flexible, and near real-time file-level data filtering service by searching files directly through the file system. As a result, big data analytics applications can use this filtering service transparently, without modification. A versatile index scheme is designed to adapt to the exploratory and ad-hoc nature of big data analytics activities. Moreover, VSFS uses a RAM-based distributed architecture to perform file indexing. The evaluations driven by three real-world analytics …
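The basic idea of file-level filtering is to answer "which files could contain what I need?" from an index before any file is opened. The single-node sketch below keeps one numeric, per-file attribute in a sorted list and answers range queries with `bisect`. It does not model VSFS's versatile index scheme, its distributed RAM-based architecture, or its file-system interface. The attribute (for example, a file's hypothetical maximum recorded temperature) and the paths are invented for illustration.

```python
import bisect


class FileAttributeIndex:
    """Sorted index over one numeric file attribute, answering range queries."""

    def __init__(self):
        self._keys = []   # attribute values, kept sorted
        self._files = []  # file paths aligned position-by-position with _keys

    def insert(self, value, path):
        i = bisect.bisect_right(self._keys, value)
        self._keys.insert(i, value)
        self._files.insert(i, path)

    def range(self, low, high):
        """Return the paths whose attribute value v satisfies low <= v <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._files[lo:hi]
```

An analytics job would then read only the returned files instead of scanning the whole dataset, which is where the speedup from filtering comes from.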


Evaluation And Analysis Of Distributed Graph-Parallel Processing Frameworks, Yue Zhao, Kenji Yoshigoe, Mengjun Xie, Suijian Zhou, Remzi Seker, Jiang Bian Jan 2014

Publications

In recent years, a number of graph-parallel processing frameworks have been proposed to address the need to process complex, large-scale graph-structured datasets. Although these frameworks have reported significant performance improvements, their comparative advantages over one another have not been fully studied, which makes it difficult to choose the best framework for a specific graph computing task and setting. In this work, we conducted a systematic comparison study of parallel processing systems for large-scale graph computations, aiming to reveal the characteristics of these systems in performing common graph algorithms with real-world …
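Many graph-parallel frameworks share a vertex-centric ("think like a vertex") programming model, in which computation proceeds in supersteps. In each superstep, vertices send messages along their edges and then update their state from the messages they received. The sequential sketch below expresses PageRank, a typical benchmark algorithm, in that style. It does not use any particular framework's API. As a simplification, rank mass held by vertices without out-edges is not redistributed.

```python
def pagerank_supersteps(edges, num_iters=20, damping=0.85):
    """Vertex-centric PageRank over a list of directed (src, dst) edges."""
    vertices = set()
    out = {}
    for src, dst in edges:
        vertices.update((src, dst))
        out.setdefault(src, []).append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(num_iters):
        inbox = {v: 0.0 for v in vertices}
        # Message phase: each vertex scatters rank / out_degree along its out-edges.
        for v in vertices:
            targets = out.get(v, [])
            for t in targets:
                inbox[t] += rank[v] / len(targets)
        # Compute phase: each vertex updates its rank from the messages it received.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}
    return rank
```

In a distributed framework, the two phases run in parallel across partitions, with a synchronization barrier between supersteps. How each system partitions the graph and moves messages is largely what such comparisons measure.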


Distance In Matrices And Their Applications To Fuzzy Models And Neutrosophic Models, Florentin Smarandache, W.B. Vasantha Kandasamy, K. Ilanthenral Jan 2014

Branch Mathematics and Statistics Faculty and Staff Publications

In this book, the authors introduce for the first time the notion of distance between any two m × n matrices. If the distance is 0 or m × n, there is nothing interesting to study. When the distance takes a value t with 0 < t < m × n, the study is both innovative and interesting. The three cases of study carried out in this book are: 1. If the difference between two square matrices is large, does it imply that the eigenvalues and eigenvectors of those matrices are distinct? Several open conjectures in this direction are given. 2. The difference between the parity check matrix and the generator matrix of the same C(n, k) code is studied. This will help in detecting errors in storage systems as well as in cryptography.
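One reading consistent with the stated range from 0 to m × n is a Hamming-style distance that counts the positions (i, j) where two equally sized matrices differ. The sketch below computes that count. Treat the definition as an assumption, since the abstract does not spell it out. Under it, a distance of 0 means the matrices are identical, and a distance of m × n means they differ in every entry.

```python
def matrix_distance(a, b):
    """Count the positions (i, j) at which two m x n matrices have different entries."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same m x n shape")
    return sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x != y)
```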


On The Security Of Auditing Mechanisms For Secure Cloud Storage, Yong Yu, Lei Niu, Guomin Yang, Yi Mu, Willy Susilo Jan 2014

Research Collection School Of Computing and Information Systems

Cloud computing is a novel computing model that enables convenient, on-demand access to a shared pool of configurable computing resources. Auditing services are essential to ensure that data is correctly hosted in the cloud. In this paper, we investigate active adversary attacks on three auditing mechanisms for shared data in the cloud: two identity-privacy-preserving auditing mechanisms, called Oruta and Knox, and a distributed storage integrity auditing mechanism. We show that these schemes become insecure when active adversaries are involved in the cloud storage. Specifically, an active adversary can arbitrarily alter the cloud data without …
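To show what an integrity audit checks at its simplest, the sketch below uses a private-key spot check. The owner computes an HMAC tag per block, bound to the block's index, before upload, and the auditor later challenges a random sample of blocks and verifies their tags. This is a deliberately simpler, swapped-in scheme. It is not Oruta, Knox, or any publicly verifiable, privacy-preserving mechanism analyzed in the paper, and it does not reproduce the attacks described there. The key, block contents, and function names are hypothetical.

```python
import hashlib
import hmac
import random


def _tag(key, index, block):
    # Binding the index prevents swapping one valid block for another.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def tag_blocks(key, blocks):
    """Owner side: compute one MAC per block before uploading the data."""
    return [_tag(key, i, blk) for i, blk in enumerate(blocks)]


def audit(key, stored_blocks, tags, sample_size, rng):
    """Auditor side: challenge a random subset of block indices and verify their MACs."""
    indices = rng.sample(range(len(stored_blocks)), sample_size)
    for i in indices:
        if not hmac.compare_digest(_tag(key, i, stored_blocks[i]), tags[i]):
            return False
    return True
```

Sampling keeps each audit cheap, at the cost of detecting a small amount of tampering only probabilistically. The schemes studied in the paper go further, letting a third party audit without the secret key, and that added machinery is what the described active attacks target.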